Is Gene Duplication a Viable
Explanation for the Origination of 
Biological Information and 
Complexity? 



All life depends on the biological information encoded in DNA with which to syn- 
thesize and regulate various peptide sequences required by an organism's cells. 
Hence, an evolutionary model accounting for the diversity of life needs to demon- 
strate how novel exonic regions that code for distinctly different functions can 
emerge. Natural selection tends to conserve the basic functionality, sequence, and 
size of genes and, although beneficial and adaptive changes are possible, these serve 
only to improve or adjust the existing type. However, gene duplication allows for a 
respite in selection and so can provide a molecular substrate for the development of 
biochemical innovation. Reference is made here to several well-known examples of 
gene duplication, and the major means of resulting evolutionary divergence, to 
examine the plausibility of this assumption. The totality of the evidence reveals 
that, although duplication can and does facilitate important adaptations by tinker- 
ing with existing compounds, molecular evolution is nonetheless constrained in 
each and every case. Therefore, although the process of gene duplication and subse- 
quent random mutation has certainly contributed to the size and diversity of the ge- 
nome, it is alone insufficient in explaining the origination of the highly complex in- 
formation pertinent to the essential functioning of living organisms. © 2010 Wiley 
Periodicals, Inc. Complexity 16: 17-31, 2011 

Key Words: gene duplication; biological complexity; evolutionary divergence; com- 
pensatory mutation; conservation of information 

1. INTRODUCTION 

1.1. The Efficacy of Natural Selection 

One of the singular issues in molecular biology and evolution concerns the
origins of the distinct exonic sequences and motifs that contribute to the
functionality of the genome and to organismic complexity. Indeed, the cause of
such a huge proliferation of genetic information, coding for polypeptides
ranging from the 49-residue echistatin to titin, a gigantic muscle protein of
over 30,000 amino acids, remains an elusive and unsolved problem in the study
of biological origins. It is presumed that the genes of all
extant and extinct species have evolved
from a life-form with a protogenome 
[1]. Natural selection per se is a poor 
candidate to explain such an evolution 
of complexity [2], as it is disposed to 
conserve the existing structure and or- 
ganization of genes, and their essential 
information content, resulting in functional stasis [3].

Research into the evolution of genes 
has shown that the peptides they code 
for are of a finicky and precarious 
nature, both marginally stable and 
prone to aggregation [4]. Protein folding 
happens to be a highly complex and 
synergistic process, involving a number 
of epistatic relationships among many 
residues. This phenomenon, com- 
pounded with the issue of interactions 
between protein molecules, can signifi- 
cantly complicate adaptive evolution 
such that in the majority of cases the 
overall effects on reproductive fitness 
are very slight [5, 6]. Many arguably 
"beneficial" mutations have been 
observed to incur some sort of cost and 
so can be classified as a form of antago- 
nistic pleiotropy [7]. 

Indeed, the place and extent of natu- 
ral selection as a force for change in 
molecular biology have been questioned in recent years [8]. Detecting the
incidence of any beneficial substitutions in genes has so far relied on
statistical inferences, as empirical evidence is less readily available.
In many instances, nonsynonymous changes and shifts
in allelic diversity may be induced by 
factors that can serve to imitate selec- 
tive effects — biased gene conversion, 
mutational and recombinational hot- 
spots, hitchhiking, or even neutral drift 
being among them [9]. Moreover, sev- 
eral well-known factors such as the 
linkage and the multilocus nature of 
important phenotypes tend to restrain 
the power of Darwinian evolution, and 
so represent natural limits to biological 
change [10]. Selection, being an essen- 
tially negative filter, tends to act against 
variation including mutations previ- 
ously believed to be innocuous [11]. For 
example, PABPC1 is a polyadenylate-binding protein used in translation
initiation in both humans and mice [12].
Although there are 92 nucleotide differ- 
ences in the translated region of the re- 
spective orthologous genes, these are all 
synonymous except in just two codons 
where Asp has been replaced at residue 
209 with Glu and Thr with Ser at residue 
576 — both similar amino acids. How- 
ever, it is also clear that the gene's role 
is essential and that any functional 
divergence in this particular case is 
unnecessary. 

1.2. Duplication as a Potential 
Driving Force Behind Molecular 
Evolution 

Gene duplication offers the prospect of 
a respite from stringent purifying/neg- 
ative selection [13]. This is because 
only one gene locus needs to be func- 
tional, meaning that any paralog is 
freer to diverge allowing for changes, 
promoted by near neutral drift, which 
would not normally be tolerated in the 
case of a singleton. It is thought that 
suboptimal and deleterious changes 
may become fixed and accumulate 
through a more permissive regime of 
selection [14], such that they fortui- 
tously combine to produce a novel 
adaptive function. However, any evolu- 
tionary development must be tem- 
pered by the impact of any changes on 
protein structure and stability [15] and 
not just the peptide sequence itself. 

Although producing identical surplus proteins may be inefficient and costly
for the cell, and can lead to unwanted overexpression and harmful phenotypes
[16], it can also prove beneficial by providing a useful double dosage [17].
Similarly, the role of duplicate genes in
facilitating alternative metabolic path- 
ways and regulatory interactions [18] is 
another important factor. 

Duplication, including instances of 
intragenic amplification, can occur by 
way of unequal chromosomal cross- 
overs, the retropositioning of spliced 
mRNA, and copying of a whole chromosome or even an entire genome;
the persistence of entire gene networks
helps to explain the presence of poly- 
ploidy in plants [19]. 

However, genomic studies have 
revealed that active duplicates may 
nonetheless be selected for their 
redundant utility [20], as they can 
serve as backups when a mutation 
inflicts damage to a sister site [21, 22]. 
This means that any changes made to 
them are liable to be selected against if 
they impair this masking ability and its 
contribution to genomic robustness. 
This may explain, in part, the huge 
effect of duplicates in shaping both 
prokaryotic and eukaryotic genomes, 
and their evolutionary preservation 
[23]. Another phenomenon involved in 
the retention of duplicate genes is 
"subfunctionalization," namely the dif- 
ferential partitioning of function or 
expression [24]. Here, redundant func- 
tions will degenerate at random from 
the daughter copies until their joint 
function matches that of the parent 
gene [25]. 

Were selection to be completely 
relaxed and any manner of changes 
permitted, this would only serve to 
guarantee complete degeneration. It 
would invariably lead to the introduc- 
tion of null and nonsense mutations, 
scrambling the open reading frame 
(ORF), and degrading the cis-regulatory
elements involved in transcription,
leading to the gene's pseudogenization.
Thus, a measure of purifying/stabiliz- 
ing selection seems necessary for 
duplicate preservation, and any evolu- 
tionary divergence would proceed 
under a relaxed regime rather than 
none at all. 

Moreover, in terms of population size, Kimura's diffusion approximation [26]
makes it abundantly clear that in diploid populations of a normal size,
typically those of N > 10,000, even the slightest degree of negative selection
is sufficient to prevent any deleterious allele from surviving and increasing
in frequency to the point of fixation or near fixation. This would mean that
any major changes in both gene singletons and duplicates alike would tend to
occur in smaller populations, where drift is much stronger and selection is
weaker.
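
The dependence on population size can be illustrated with a short numerical sketch. It assumes the commonly quoted form of Kimura's diffusion approximation for the fixation probability of a single new mutant under genic selection, u = (1 - exp(-4 N_e s p)) / (1 - exp(-4 N_e s)) with p = 1/(2N). The function name, the default N_e = N, and the selection coefficient used in the loop are illustrative assumptions and are not taken from the study.

```python
import math

def fixation_probability(s, n_e, n=None):
    """Fixation probability of one new mutant allele (diffusion approximation).

    s   : selection coefficient (negative = deleterious); genic selection assumed
    n_e : effective population size
    n   : census population size (assumed equal to n_e when omitted)
    """
    n = n_e if n is None else n
    p0 = 1.0 / (2.0 * n)            # starting frequency of a single copy in a diploid population
    x = 4.0 * n_e * s
    if abs(x) < 1e-12:              # neutral limit: probability equals the starting frequency
        return p0
    if x < -700.0:                  # strongly deleterious: avoid overflow in expm1
        return math.expm1(-x * p0) * math.exp(x)
    # (1 - e^{-x p0}) / (1 - e^{-x}), written with expm1 for numerical accuracy
    return math.expm1(-x * p0) / math.expm1(-x)

# Illustrative comparison: a slightly deleterious allele (s = -0.001, assumed value)
# relative to a neutral one, at three population sizes.
for size in (100, 1_000, 10_000):
    neutral = 1.0 / (2.0 * size)
    relative = fixation_probability(-0.001, size) / neutral
    print(f"N = {size:>6}: fixation probability relative to neutral = {relative:.3g}")
```

Under these assumptions the relative fixation probability falls steeply as N grows, which is the sense in which drift dominates only in small populations.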

2. THE EVOLUTION OF GENETIC 
INFORMATION 

The purpose of this study is to deter- 
mine the existence and extent of any 
novel information produced as a con- 
sequence of gene duplication. At stake 
is whether there is sufficient support- 
ing evidence that the digitally commu- 
nicated instructions [27] encoded in 
DNA could have been constructed 
through known evolutionary processes, 
or whether the data suggests that an 
alternative explanation is required as 
in all codified nonbiological informa- 
tion. Therefore, this would serve as a 
means of assessing the current argu- 
ments regarding the origins of biologi- 
cal and genomic complexity. 

2.1. The Information Conundrum 

Although the nucleotide sequences in 
DNA are commonly understood to 
carry/convey biological "information" 
[28], a precise scientific delineation for 
the term in the context of genetics is 
often found to be lacking. Therefore, it 
is impossible to test any hypothesis 
regarding the creation of new genetic 
information without offering at least a 
conceptual definition of what informa- 
tion means and what the criterion is 
for identifying it. In Shannon's theory 
[29] of communication, information is 
termed the "reduction in uncertainty," 
where entropy is the measure of any 
stochastic dependencies — the greater 
the level of uncertainty that exists in a 
particular situation, the less likely it is 
to predict the behaviors and outcomes 
because of the presence of random 
noise. Therefore, information is that 
which denotes a degree of determin- 
ism in a known relationship, although 
this would also have to involve a large 
measure of contingency to permit as 
many possible combinations to be 
conveyed. In the framework of molecular biology, information would refer
to the inherent functionality of gene 
products: i.e., how they interact with 
the biochemical environment in which 
they operate. 
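
Shannon's measure can be made concrete with a minimal sketch. It computes the entropy H = -sum(p_i * log2(p_i)) over the symbol frequencies of a sequence; the function name and the toy sequences are illustrative assumptions. The quantity captures statistical uncertainty only and says nothing about biochemical function, which is why a functional definition is developed in the text.

```python
import math
from collections import Counter

def shannon_entropy(sequence):
    """Shannon entropy (bits per symbol) of the symbol frequencies in a sequence."""
    counts = Counter(sequence)
    total = len(sequence)
    # Sum p * log2(1/p) over observed symbols; a single repeated symbol gives 0 bits.
    return sum((c / total) * math.log2(total / c) for c in counts.values())

# Toy examples (hypothetical sequences, not data from the study)
uniform = shannon_entropy("ACGT" * 10)     # four equally frequent bases
invariant = shannon_entropy("A" * 40)      # no uncertainty at all
print(uniform, invariant)
```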

Therefore, I have decided to define 
any gain in exonic information as: 
"The qualitative increase in opera- 
tional capability and functional speci- 
ficity with no resultant uncertainty of 
outcome." The two parts of the state- 
ment are complementary, because an 
appreciably great degree of specificity 
is required to reduce any uncertainty 
and problems regarding behavior and 
effect: this is especially true in the case 
of enzymes that catalyze only particu- 
lar reactions, and to the exclusion of 
all others. A random mutation in the 
active site could well lead to an "ad- 
vantageous" outcome in a particular 
environment owing to a shift in cata- 
lytic activity. However, the evidence 
suggests this would entail an alteration 
in the particular specificity pattern 
[30]. Therefore, an increase in uncertainty and more erratic behavior,
with respect to the overall and net effect(s), would be a consequence
of such a development.

2.2. The Relationship Between
Sequence, Function, and Evolutionary
Divergence

Usually, it is safe to say that homologs 
share basically the same function and 
that many changes in sequence are 
not consequential. However, this is 
very much a general rule. A single 
amino acid replacement in a carboxyl 
esterase in blowflies confers organo- 
phosphorus insecticide resistance [31], 
although this is because of a loss in 
the primary enzymatic activity. Many 
synonymous changes have indeed 
been identified with codon usage bias, 
contributing to splicing and translational efficiency [32]. A study has
found that there exists a threshold at ~50% sequence similarity below which
functional divergence is enhanced [33]. Orthologs performing the same
function should be under the same selective constraints and evolve at the same
rate. But in the case of paralogs, there 
is a relaxation of purifying selection, 
and distinguishing loss of constraint 
from rapid evolution driven by adapta- 
tion is difficult because the loss of 
constraint often precedes any potential 
neofunctionalization [34]. 

2.3. Testing for the Role of Natural 
Selection in the Creation of Novel 
Functionality 

Detecting the effect of Darwinian posi- 
tive selection — whereby an allele is 
supposed to increase in frequency 
because it confers a reproductive 
advantage — is not an exact science by 
any means, and it relies on statistically
based inferences that leave much to
interpretation. Even if adaptive muta- 
tions have been prominent in a gene, 
it is not accurate to necessarily infer 
that any new functionality has arisen. 
All it means is that an allele has con- 
tributed to a gain in reproductive fit- 
ness, and nothing beyond that. In 
many instances, as with the example 
above, a loss of function and regula- 
tion in a harsh or unusual environ- 
ment can have a beneficial outcome 
and thus be selected for — bacteria tend 
to evolve resistance to antibiotics in 
such a way through mutations that 
would otherwise adversely affect mem- 
brane permeability [35]. The magnifi- 
cation of the importance of one or 
more loci is tantamount to artificial 
selection, but occurs in some cases 
during drastic environmental catastro- 
phes, where a single trait might make 
a difference between survival or not. 

Population genetics methods typi- 
cally involve measuring levels of heter- 
ogeneity and polymorphism at sites 
including and in proximity to the one 
under investigation [36]. This can lead to confusing results because the
effects of Darwinian selection are often the same as those of background
selection (the purging of neutral alleles due to their spatial proximity to
deleterious ones [37]); the case of the gene implicated in microcephaly is
likely a controversial example of this [38]. Sequence
alignment methods are preferred, 
especially where data from a sample of 
a population is not available. As such, 
three ratios were determined and used 
throughout to detect the probability of 
functional change [39]. 

i. The ratio of nonsynonymous to
synonymous substitutions, dN/dS
(ω), is regarded as the most
obvious indication of adaptive
change and functional shift [40]. In
the case of neutral evolution, it 
would be around 1:1, but the pro- 
portion is skewed in favor of the 
former if positive selection is prev- 
alent, whereas purifying selection 
is inferred when this is reversed. 
When comparing singletons in dif- 
ferent phylogenetic lineages, this is 
a very powerful method, but in the 
case of duplicates more caution is 
required. As has been previously 
mentioned, there is an appreciably 
relaxed regime of selection in 
paralogous genes because only one 
need maintain the original func- 
tion(s). As such, the rate of nonsy- 
nonymous substitutions may be 
much higher, not on account of 
adaptive evolution, but because 
purifying selection is far less strin- 
gent than it is for singletons. 

ii. The transition to transversion ratio,
ts/tv (κ), is also a useful test.
Although there are twice as many
possible transversions as there are
transitions, the molecular mechanisms
by which they are generated
mean that transitions (e.g., purine
to purine) are more frequent than
transversions (e.g., purine to pyrimidine).
Notwithstanding mutational
bias, the ratio can be seen as evidence
for adaptation if transversions
greatly exceed transitions
[41].

iii. The ratio of radical to conservative
replacements, K_R/K_C, is a measure
of the nature of the evolutionary
changes in peptides. As many
amino acids are chemically similar,
they may also be relatively interchangeable
(as with Val, Ile, and Leu)
and so can be regarded as
essentially neutral substitutions.
Therefore, dN/dS may not reflect
the significance of any divergence.
If K_R/K_C is >1, then this could be
suggestive of the fixation of benefi- 
cial mutations. However, such is 
the nature of context specificity 
within protein domains that a sub- 
optimal but still conservative 
replacement at one site could 
require a compensatory [42] and 
more radical change at another. 
Although widely used, the method
has been criticized for being too
simple and for showing nothing about
actual changes in the behavior of
the protein [43].
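
The first two ratios above can be sketched as simple counts over a pair of aligned, in-frame coding sequences. The following is a deliberately simplified illustration and not the procedure used in this study: it tallies raw differences rather than normalizing by the number of synonymous and nonsynonymous sites, it classifies each differing codon as a whole, and it makes no correction for multiple hits. The function name and the toy sequences are assumptions; the codon table is the standard genetic code.

```python
from itertools import product

# Standard genetic code, codons enumerated in TCAG order
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}
PURINES = {"A", "G"}

def count_substitutions(seq_a, seq_b):
    """Tally raw differences between two aligned, in-frame coding sequences."""
    seq_a, seq_b = seq_a.upper(), seq_b.upper()
    counts = {"syn": 0, "nonsyn": 0, "ts": 0, "tv": 0}
    for i in range(0, min(len(seq_a), len(seq_b)) - 2, 3):
        codon_a, codon_b = seq_a[i:i + 3], seq_b[i:i + 3]
        if codon_a == codon_b:
            continue
        if codon_a not in CODON_TABLE or codon_b not in CODON_TABLE:
            continue  # skip codons containing gaps or ambiguous bases
        # Nucleotide level: purine<->purine or pyrimidine<->pyrimidine is a transition
        for x, y in zip(codon_a, codon_b):
            if x != y:
                same_class = (x in PURINES) == (y in PURINES)
                counts["ts" if same_class else "tv"] += 1
        # Codon level: synonymous if the encoded amino acid is unchanged
        same_aa = CODON_TABLE[codon_a] == CODON_TABLE[codon_b]
        counts["syn" if same_aa else "nonsyn"] += 1
    return counts

# Toy example (hypothetical sequences)
print(count_substitutions("ATGGCTAAA", "ATGGCCAAT"))
```

A radical versus conservative tally could be added in the same way, given a chosen grouping of amino acids by physicochemical class; that grouping is itself a modeling decision and is omitted here.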



2.4. Aims of Investigation and 
Materials Used 

Several familiar and exemplary cases 
of evolution following an initial gene 
duplication were chosen and catego- 
rized according to known mechanisms 
of divergence that include fusion, fra- 
meshift mutations, retroposition, inter- 
nal amplification, and de novo recruit- 
ment. There is, of course, considerable 
overlap between these various mecha- 
nisms, although the primary focus is 
different for each case. The scope and 
remit of the investigation was limited 
to exonic sequences within the trans- 
lated regions, thus largely avoiding reg- 
ulatory areas and introns, where retro- 
transposon insertions are believed to 
be significant [44]. Although gene reg- 
ulation and expression are important, 
it is the regions that code for protein 
sequences that comprise by far the pri- 
mary source of biological information. 
All pertinent sequence data, both nucleotide and amino acid, were
downloaded from the NCBI database and taken from where it is cited in the
relevant literature. Standard alignments for analyzing and illustrating
the data were performed using BLAST, with more advanced pairwise
alignments using the ClustalW2 algorithm together with EMBOSS.
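
As a rough illustration of how a pairwise identity figure can be read off an alignment, the sketch below counts matching positions over the ungapped columns of two pre-aligned sequences. It is an assumption-laden toy: the function name is hypothetical, gaps are written as "-", and identity is taken over gap-free columns only, which is one convention among several and is not necessarily the one applied by BLAST, ClustalW2, or EMBOSS.

```python
def percent_identity(aligned_a, aligned_b, gap="-"):
    """Percent identity over the gap-free columns of two pre-aligned sequences."""
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have equal length")
    # Keep only columns where neither sequence has a gap
    columns = [(x, y) for x, y in zip(aligned_a, aligned_b) if gap not in (x, y)]
    if not columns:
        return 0.0
    matches = sum(1 for x, y in columns if x == y)
    return 100.0 * matches / len(columns)

# Toy example (hypothetical sequences): five gap-free columns, four matches
print(percent_identity("ACGT-A", "ACCTTA"))
```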

3. ANALYSIS OF GENE 
DUPLICATION BY EXAMPLE 

3.1. Duplication and Gene Fusion: The 
Case of Sdic 

Sdic is believed to be a flagellar dynein 
gene found only in Drosophila mela- 
nogaster — an example of a tandem 
duplicated chimeric gene "caught in 
the act" of evolving [45]. It was formed 
when two adjacent genes, AnnX 
(coding for a cell adhesion protein) 
and Cdic (encoding a cytoplasmic in- 
termediate chain dynein), were first 
duplicated and one pair subsequently 
underwent a deletion-mediated fusion. 
Sdic is found to be composed of four 
paralogs having itself been duplicated 
twice over. The 5' untranslated region 
(UTR) and part of the promoter 
sequence of the gene derives from 
AnnX, whereas the translated part and 
all 300 base pairs (bp) of the 3' UTR 
come from the Cdic gene. A sequence 
comparison of Sdic2 and Cdic reveals 
that 522 out of 527 residues (99%) can 
be aligned without difficulty. Sdic has 
been observed to be expressed in the 
testes and incorporated into the sperm 
tail and this is because it has acquired 
a testis-specific core element, homolo- 
gous with those of other promoter 
sequences, from the 3'UTR of AnnX 
[46]. It is unclear whether the element 
is a translational enhancer or has 
some other regulatory role in the AnnX 
gene such as, for example, in mRNA 
localization. Either way, the gene 
would seem to contribute to greater fe- 
cundity. 

But it is the loss of over 100 codons from Cdic's N-terminus [47], involving
at least two domains, that deprives the Sdic protein of the motifs necessary
to enable it to interact with dynactin (a basic characteristic of cytoplasmic
dyneins) and that represents the principal functional shift. Thus, Sdic is
axonemal almost by default owing to the
mass deletion of exonic information 
pertinent to cytoplasmic-specific oper- 
ation (Figure 1). The gene's promoter 
has simply acquired features from pre- 
existing coding sequences and infor- 
mation present in AnnX, whereas its 
translated region is virtually identical 
with the corresponding part of Cdic. 

The distal and proximal conserved 
elements are also found to be very 
similar to those of the Cdic promoter. 
In addition, the 16 codons present at
the N-terminus of Sdic, recruited from
Cdic's third intron along with an 11 bp
insertion, bear a tenuous resemblance
to the amino ends of axonemal inter- 
mediate chain dyneins such as those 
for oda6 and AclC3 [48]. It is reasona- 
ble to assume that this small amount 
of exonization, allowing a previously 
noncoding region UTR to become the 
start site and initial part of the Sdic gene, 
is adaptive. As such, this could be inter- 
preted as evidence for the de novo crea- 
tion of novel information. 

Further evidence for the role of 
selection in the development of Sdic 
includes a possible sweep found in the 
low levels of polymorphism across 
neighboring loci and a skewed fre- 
quency distribution of allelic variation. 
However, it is noted that a reduced level 
of heterozygosity in a region of low 
recombination, such as at the base of 
the X-chromosome where Sdic is 
located, is also consistent with back- 
ground selection because of the effect 
of deleterious mutations [49]. Both 
analyses could in fact be correct. 
Although the number of nonsynony- 
mous differences is greater than synon- 
ymous ones, as would be expected in a 
basic test for adaptive evolution, this is 
due to a bulk deletion and resultant fra- 
meshift occurring in the fourth domain 
(inherited from Cdic) that produced a 
string containing at least five novel 
characters. As this domain is believed to 
be nonfunctional in Sdic, it is more logical to infer the existence of a
relaxed regime and decrease in selective constraints, than to assume any adaptive
change. Therefore, the initial loss of in- 
formation at the N-terminus because of 
relaxed selection was then compen- 
sated for by the recruitment of sequen- 
ces from an intron of Cdic and the 
exons of AnnX. In this way, a nonfunc- 
tional cytoplasmic dynein "evolved" 
into an axonemal one through a process 
of copy, cut, and paste. 

Divergence between the Sdic paral- 
ogs themselves has been very limited 
such that the translated regions of 
Sdic2 and Sdic4 actually share a 100% 
nucleotide identity and are functionally redundant. Although the gene is
considered to be young, having formed within the last 2-3 million years, the
short generation time of the fruit fly (~2 weeks) means that the evolutionary
timescale may actually be rather long (~50 million generations).

It is possible that Sdic contributed to 
speciation and the emergence of the 
melanogaster line [50]. The most likely 
scenario involves a population bottle- 
neck, migration, or founder effect [51]. 
Any reduction in effective population 
size would also produce a further relax- 
ation of selective constraints as (nearly) 
neutral drift would predominate. 

It appears that deletion in this 
instance was one of the necessary fac- 
tors involved in gene fusion. As such, 
Sdic is shorter than Cdic, and this is true also for the hominoid oncogene,
TRE2, which is 200 residues shorter than one of its parents, USP32 [52]. This
presents a problem in terms of explaining any accretion of cistron size with
reference to the most naturally applicable evolutionary process.
Deletion-mediated fusion also means that usually one of the genes is far less
preserved than the other. In the case of Kua-UEV, however, the effect is
additive because it has retained the original and separate functions of both
its parents [53]. Although it may
behave slightly differently, particularly 
with respect to intracellular localiza- 
tion, the information content has not 
appreciably changed. 



3.2. Duplication and Frameshift 
Mutation 

Already briefly mentioned in the previous section, another potential means
by which new genes, with new exonic information, might arise is by way of a
frameshift resulting in an entirely different ORF and peptide sequence. Just
such a development was proposed by Ohno [54] for a nylon oligomer hydrolase
found in bacteria near sites involved in the production of the synthetic
material. However, a study by Negoro et al. [55] found that the likely source
was actually an esterase containing a β-lactamase fold. Two amino acid
replacements in the catalytic cleft greatly
increased the Ald-hydrolytic activity, in 
some measure already provided by a 
serine active site, necessary for the 
degradation of the oligomers. However, 
this does appear to have come at some 
cost to part of the esterolytic function 
and the enzyme does not have nearly 
the specificity constant and efficiency, 
with respect to its alternative function- 
ality, of a hydrolase such as aminoacy- 
lase [56]. Therefore, although there is 
an appreciable gain in operational 
capability, no new information was 
generated that specified oligomer deg- 
radation. 

Scherer and coworkers [57], using a 
search on BLAST, found that as many as 
470 duplicated genes in humans had 
been affected by frameshift translation. 
However, frameshifts induced either by 
indels or by transposons (mobile ele- 
ments) are themselves poor candidates 
for the generation of novel information 
because they almost inevitably incur pre- 
mature stop codons [58], leading to pro- 
tein truncation, in addition to scrambling 
part of the original reading frame. This is 
indeed evident in some of the genes pre- 
sented in their study. HTR3D is a hy- 
droxytryptamine (serotonin) receptor in 
humans, which is essentially the carboxyl 
terminus remnant of HTR3C. However, 
owing to the inherent modularity of a gene, the truncated daughter copy has
retained at least part of the parental functionality, whereas the rest has been
essentially ignored by purifying selection. Protein truncation in duplicates
can also occur by way of a nonsense mutation resulting in a premature stop in
translation: the G-type cyclin CCNG1, involved in the regulation of cell cycle
kinases, is found to be missing an important "PEST" sequence at the C-terminus
that is present in its paralog, CCNG2 [59].

FIGURE 1

The alignment of Cdic and Sdic (2 and 4) reveals the virtual identity of the corresponding coding regions in the genes. The N-terminus of Cdic, consisting
of 100 codons, is missing in Sdic, and this means that Sdic lacks the motifs necessary with which to interact with dynactin. An intronic recruitment at
the amino end has led to the exonization of 16 codons, whereas another deletion downstream, this time involving the loss of 16 codons, is present
within the fourth domain from the 5' end. This development has resulted in a frameshift that provided five novel characters in the sequence.

The authors cite as one such exam- 
ple of a possible frameshift the gene 
SLC25A37, a member of the mitochon- 
drial solute carrier family. Indeed, an 
analysis reveals that SLC25A37 was created, as shown in Figure 2, as a result
of a bulk deletion together with a single nucleotide insertion in a copy of
the likely parent, SLC25A28, although
the exact sequence of events cannot be 
determined. As a result of the frame- 
shift, 54 novel characters were gener- 
ated but 22 were also deleted, casting 
doubt on the biochemical importance 
of this resulting minisequence. This 
would suggest that despite the extent 
of nonsynonymous differences evident 
in SLC25A37, these are likely to have
been the result of relaxed purifying 
selection rather than any beneficial 
increase in information. 

It would be useful to test for the effect of natural selection in the 302
codons of the gene downstream of the frameshift, where the original reading
frame has been restored. Accordingly, it was observed that 213 aligned
residues were identical and that the ratio (ω) of nonsynonymous to synonymous
base pair substitutions was greater than 1.0 (169:114). The ratio (κ) of
transitions to transversions was roughly equal (135:148), as was the ratio of
radical to conservative amino acid substitutions (49:40). This likely reflects
an overall structural realignment, possibly to offset the radical changes and
deletions at the N-terminus, rather than any major functional shift.
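
For reference, the counts reported above can be converted into the three ratios with a few lines of arithmetic. The sketch uses only the figures quoted in the text; the variable names are illustrative. These are simple ratios of raw counts and are not normalized per synonymous or nonsynonymous site.

```python
# Counts quoted in the text for the 302 codons downstream of the frameshift
counts = {
    "omega (nonsynonymous:synonymous)": (169, 114),
    "kappa (transitions:transversions)": (135, 148),
    "radical:conservative": (49, 40),
}

ratios = {name: a / b for name, (a, b) in counts.items()}
for name, value in ratios.items():
    print(f"{name}: {value:.2f}")

# Internal consistency of the reported figures:
# both nucleotide tallies describe the same set of differences,
# and the amino acid replacements equal the non-identical residues.
nucleotide_totals_agree = (169 + 114) == (135 + 148)
replacements_agree = (49 + 40) == (302 - 213)
print(nucleotide_totals_agree, replacements_agree)
```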

Evolutionary divergence by frameshift mutation, and several other mechanisms,
has also taken place in the FUT gene family in humans [60]. All but one of the
nine genes are monoexonic and all code for a fucosyltransferase, an enzyme
that transfers fucose onto the terminal residues of glycans, albeit on a
different variety of substrates. FUT3 and FUT6
are believed to be the most expressed 
members within the family and share 
a >90% nucleotide identity, displaying 
no discernibly significant functional 
differences. Both have diverged from a 
common ancestor, quite possibly FUT5 
itself, by way of a 40-bp deletion and 
resultant frameshift at the N-termi- 
nus — in much the same manner as the 
previous example. This is consistent 
with an inference for the relaxation of 
selective constraints and partial degen- 
eration followed by a suppressing 
mutation. 
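
The pattern described here, a frameshifting deletion followed by a suppressing insertion that restores the downstream reading frame, can be illustrated with a toy example. The sequence below is invented for illustration and is not taken from the SLC or FUT genes; the function name is an assumption, and the codon table is the standard genetic code.

```python
from itertools import product

# Standard genetic code, codons enumerated in TCAG order
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}

def translate(dna):
    """Translate complete codons from position 0; trailing bases are ignored."""
    return "".join(CODON_TABLE[dna[i:i + 3]] for i in range(0, len(dna) - 2, 3))

original = "ATGGCTAAAGATCCGTTCGGATAA"      # hypothetical parent ORF

# 1. A single-nucleotide deletion shifts the reading frame downstream of it
shifted = original[:3] + original[4:]

# 2. A single-nucleotide insertion a little further on restores the frame
restored = shifted[:8] + "A" + shifted[8:]

print(translate(original))   # parent peptide
print(translate(shifted))    # scrambled after the deletion
print(translate(restored))   # frame recovered downstream of the insertion
```

In this toy case only the short stretch between the two indels differs from the parent peptide, which mirrors the limited region of altered sequence described for the genes above.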

FIGURE 2

Divergence by way of a frameshifting event in SLC and FUT genes in Homo sapiens. The regions within each gene sequence affected by indels are
shown above. In the case of the mitochondrial solute carrier gene, SLC25A28, a 16-nt deletion at the N-terminus has occurred in a duplicated copy
of it. This alone would have truncated the gene into two separate reading frames: 1-252 and 252-1079. However, the insertion of adenine at nt
position 48 suppresses any gene fission and restores the length of the original reading frame, giving rise to SLC25A37. In the FUT genes, a
combination of deletions and at least one insertion in FUT3 and FUT6 caused a significant divergence in sequence from a common ancestor whose
translated region would have resembled FUT5. In both cases, the reading frame is altered for a short region, involving the loss of many codons, before
being reconstituted downstream and thus demonstrating a conservation of information.

Therefore, although frameshifts have the potential to cause more rapid
sequence divergence than can individual point mutations, it is wrong to
assume that they can produce any novel information even if they do
result in the emergence of novel characters within proteins. Therefore, a
divergence in sequence need not result 
in a change in functionality or affect 
behavior, as the same information can 
be constructed using a number of dif- 
ferent amino acid arrangements. In 
duplicates, and also singletons, 
changes may be compensatory and in 
response to prior degeneration rather 
than representing any innovation. 
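
The pattern described in this section, an indel that shifts the reading frame followed by a second indel that restores it, can be illustrated with a minimal Python sketch. The sequence used here is invented for illustration and is not taken from SLC25A28 or the FUT genes; only the standard genetic code is assumed.

```python
# Illustrative sketch: the DNA string below is hypothetical, not a real gene.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
# Standard genetic code, built in the conventional TCAG ordering.
CODON_TABLE = {
    a + b + c: AMINO[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}


def translate(dna):
    """Translate frame 0; '*' marks a stop, a trailing partial codon is ignored."""
    return "".join(CODON_TABLE[dna[i:i + 3]] for i in range(0, len(dna) - 2, 3))


ancestral = "ATGGCTGAACGTTTACCGGATAAAGGCTCATGG"
# A 1-nt deletion in the second codon shifts every downstream codon.
deleted = ancestral[:4] + ancestral[5:]
# A 1-nt insertion further downstream brings the original frame back.
restored = deleted[:11] + "A" + deleted[11:]

print(translate(ancestral))  # MAERLPDKGSW
print(translate(deleted))    # MVNVYRIKAH
print(translate(restored))   # MVNVLPDKGSW
```

In this toy case only the three residues between the two indels differ from the ancestral peptide; everything downstream of the second indel is read exactly as before.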

3.3. Gene Duplication and 
Retroposition: The Case of Jingwei 
and Adh 

Another gene of interest to researchers 
of molecular evolution, found in Drosophila
yakuba and D. teissieri, is Jingwei
(jgw). Like Sdic, it is a chimeric
gene except that it was formed from 
the retropositioning (by reverse tran- 
scription) of one gene into the dupli- 
cated copy of another [61]. This consti- 
tutes a type of ectopic recombination, 
otherwise known as exon shuffling. 
The first three exons are considered to 
be derived from a duplicated copy 
(ynd) of a gene that is expressed 
uniquely in the testes (ymp). Therefore, 
the N-terminal domain of ynd has 
donated the non-Adh portion of jgw 
and this appears to be well preserved 
by purifying selection, indicative of the 
retention of functionality and also of 
the modular structure and organiza- 
tion of the gene [62]. 



Adh is an alcohol dehydrogenase 
that occurs in many organisms and 
facilitates the interconversion between 
alcohols and aldehydes. The retrose- 
quence of the gene was copied and 
inserted into the third intron of ynd 
and nine downstream exons of ynd became
pseudoexons, because transcription
stopped at the terminating signal
encoded in the Adh-derived exon. Initially,
this led researchers to believe
that Jingwei was nothing other than a 
pseudogene, and its exact function is 
still unknown. As such, the first 68 co- 
dons of the translated region are 
derived from the ymp/ynd gene, 
whereas the remaining 255 are derived 
from the original 272 codons of the 
translated region in the ancestral Adh
gene. Betran [63] and others speculate
that the number of nonsynonymous
changes should be regarded as evidence
for rapid adaptive evolution.
Indeed, only 92 of the original 272
residues remain (almost the minimum
proportion to identify a homology),
whereas the ratio (ω) of nonsynonymous
to synonymous changes is
astonishing (332:58). The ratio (κ) of
transitions to transversions (154:236)
and that of radical to conservative amino acid
replacements (113:49) are indicative
too of a substantial functional shift.
But is that really what has happened? 
Is there another explanation to 
account for this? 
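
For orientation, the raw counts quoted above can be turned into simple ratios. The short Python sketch below does only that arithmetic; it assumes the counts exactly as given in the text and does not normalize by the number of synonymous and nonsynonymous sites, as a formal estimate of ω would.

```python
def ratio(numerator, denominator):
    """Plain ratio of two raw counts (no per-site normalization)."""
    return numerator / denominator


# Counts as quoted in the text for the Adh-derived part of Jingwei.
nonsyn, syn = 332, 58
transitions, transversions = 154, 236
radical, conservative = 113, 49
retained, ancestral_len = 92, 272

print(f"nonsynonymous:synonymous = {ratio(nonsyn, syn):.2f}")              # 5.72
print(f"transitions:transversions = {ratio(transitions, transversions):.2f}")  # 0.65
print(f"radical:conservative = {ratio(radical, conservative):.2f}")        # 2.31
print(f"ancestral residues retained = {ratio(retained, ancestral_len):.1%}")   # 33.8%
```

Both the nonsynonymous/synonymous and the transition/transversion counts sum to 390 changes, so the two tallies describe the same set of substitutions.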

Clearly, selective constraints have 
been relaxed as nine of the exons of 
the ynd gene were silenced by the ini- 
tial act of retroposition, whereas the C- 
terminus of the intronless Adh retrose- 
quence itself has been truncated by a 
frameshift with the resultant loss of 15 
codons, for which a single nucleotide 
insertion would appear to be responsi- 
ble. This loss of information is to be 
expected in a model of relaxed selec- 
tion. The distribution of nonsynony- 
mous changes in the Adh part of Jing- 
wei is also found to be relatively uni- 
form and not clustered in one 
particular region — the active site of 
Adh is indeed well preserved. This is 
either suggestive of widespread direc- 
tional selection or, rather, random 
degeneration and destabilization: i.e., a 
failure to preserve the integrity of the 
sequence. However, the actual situa- 
tion is likely to be more nuanced than 
either scenario would suggest. The 
introduction of deleterious changes 
could also set off a process and chain 
reaction of ensuing compensatory 
mutations observed in other genes [64, 
65] — a proclivity toward physical sta- 
bility being inherent in the nature of 
all proteins. Compensatory mutations 
would thus make up for any subopti- 
mal or potentially damaging amino 
acid replacements elsewhere in the 
sequence, as opposed to back mutations
that simply restore the ancestral
residue. In this way, evolutionary diver- 
gence need not result in any change in 
the information content and function- 
ality even if the resultant peptide 
sequence is substantially altered. 
Moreover, the effect of compensation 
following partial degeneration would 
be indistinguishable from any func- 
tional innovation because both are 
beneficial. 

In vitro experiments, using a bacte- 
rial host species, appear to show that 
jingwei is a dehydrogenase dimer that 
catalyzes like Adh but with altered and 
diversified substrate binding activity 
and utilization [66, 67]. This is congru- 
ent with other research into the evolu- 
tion of duplicates, such as within the 
xanthine dehydrogenase family [68]. 
One possibility to account for this may 
be that the gene product folds abnor- 
mally and so has lost functional speci- 
ficity. In any case, as with all chimeric 
genes, jingwei has retained the core 
functionality of one or both of its 
parents but with a reduced pattern of 
expression. 

Retroposed Adh mRNA features in 
two other chimerical genes, Adh-Twain 
and Adh-Finnegan, where it has been 
inserted in different species of Dro- 
sophila. Interestingly, 230 of the 255 
residues contained in the correspond- 
ing Adh sequences are identical in 
Jingwei and both Adh-Twain and Adh- 
Finnegan. Begun and Jones [69] sug- 
gest that some sort of convergent ad- 
aptation could be at work, but that 
seems unlikely given that these genes 
have markedly different patterns of 
expression [70] — it is perhaps more 
reasonable to infer that the Adh part 
has undergone the same level of ini- 
tially relaxed selection followed by rep- 
arative compensation. The observed 
incidence of parallel evolution, as can 
be seen in Figure 3, something found 
to be relatively widespread in genetics 
[71], might be because of a common 
mutational susceptibility — for which 
the initial loss of introns associated 
with the Adh part [72] and need for 



priority readjustments may be a factor. 
Indeed, research tends to suggest that 
the presence of introns does have a 
significant effect on mRNA stability 
[73]. It is interesting that Begun and 
Jones infer a burst of evolutionary ac- 
tivity in the early stages but a noticea- 
ble slowing down later on. This is con- 
sistent with a model of initially relaxed 
selection in a population increasing in 
size following a bottleneck. The probability
of fixation in a diploid population
is 1/(2N) for a new neutral allele having
no selective (dis)advantage, and so
fixation is more likely to occur in a smaller population.
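
The neutral fixation probability of 1/(2N) for a single new copy in a diploid population of N individuals can be checked with a small Wright-Fisher simulation. The Python sketch below is a minimal illustration under textbook assumptions (constant population size, random mating, non-overlapping generations, no selection); the function name and parameter values are chosen here for illustration only.

```python
import random


def fixation_fraction(n_diploid, trials, seed=0):
    """Fraction of trials in which one new neutral copy drifts to fixation."""
    rng = random.Random(seed)
    total_copies = 2 * n_diploid
    fixed = 0
    for _ in range(trials):
        copies = 1  # a single new mutant copy
        while 0 < copies < total_copies:
            freq = copies / total_copies
            # Binomial resampling of the next generation's gene copies.
            copies = sum(rng.random() < freq for _ in range(total_copies))
        if copies == total_copies:
            fixed += 1
    return fixed / trials


for n in (5, 10, 20):
    simulated = fixation_fraction(n, trials=5000)
    print(f"N={n}: simulated {simulated:.3f}, expected {1 / (2 * n):.3f}")
```

The simulated fraction should fall as N grows, tracking 1/(2N) to within sampling error.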

3.4. Classical Duplication and 
Divergence 

Perhaps the best example of how 
duplication and classical evolutionary 
divergence can facilitate ecological ad- 
aptation is the unique case of con- 
certed evolution in colobine monkeys. 
The animals have adjusted to a pre- 
dominantly leaf-eating diet by evolving 
a variant pancreatic ribonuclease 
(pRNase) recruited to perform a partic- 
ular role as a digestive enzyme in fore- 
gut fermenters [74]. The data suggests 
that two pRNase paralogs (1A and 1B),
both 156 residues long, have been 
selected for in the colobine monkeys, 
with one adapting to its role with the 
loss of positive charge, namely arginine
residues. In Colobus polykomos,
the number of acidic residues in this 
gene product has increased from 13 to 
15, whereas those for bases have 
decreased from 20 to 17. A test for 
selection revealed evidence in support 
of a partial gain in function. A total of 
15 bp substitutions were identified, 13 
of which replaced the ancestral amino 
acid. However, the number of transi- 
tions to transversions was as expected 
from neutral evolution (11:4), and only 
five of the residue changes were radi- 
cal ones. 
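
The shift in charge implied by these residue counts is simple arithmetic, sketched below in Python. The counts are those quoted in the text; treating net charge as basic minus acidic residues, and counting only Lys/Arg as basic and Asp/Glu as acidic, are simplifying assumptions of this sketch.

```python
ACIDIC = set("DE")  # aspartate, glutamate
BASIC = set("KR")   # lysine, arginine (histidine ignored in this sketch)


def charged_counts(protein):
    """Return (acidic, basic) residue counts for a one-letter protein string."""
    acidic = sum(res in ACIDIC for res in protein)
    basic = sum(res in BASIC for res in protein)
    return acidic, basic


def net_charge(acidic, basic):
    """Crude net charge estimate: basic minus acidic residues."""
    return basic - acidic


ancestral = net_charge(acidic=13, basic=20)
derived = net_charge(acidic=15, basic=17)
print(ancestral, derived, derived - ancestral)  # 7 2 -5
```

On this crude measure the derived enzyme remains positively charged overall, but by five units less than its ancestor.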

Therefore, this classical model of 
gene duplication, mutation, and natu- 
ral selection would appear to demon- 
strate how evolutionary processes can 
FIGURE 3

[Amino acid alignment of ancestral Adh (ADH) with the retroposed Adh portions of Jingwei (JGW), Adh-Twain (ATW), and Adh-Finnegan (AFG); sequence panels not reproduced.]

A comparison of retroposed Adh in Jingwei (Jgw), Adh-Twain (Atw), and Adh-Finnegan (Afg) in different species of Drosophila. The ancestral
sequence (Adh) is also given. Adh has evolved in a parallel fashion, where it has been inserted into a host gene by retroposition. A frameshift at the C-terminus
has truncated the protein in Jgw and the others. Clearly, the insertion of the retroduplicated gene has resulted in a very similar pattern of molecular
evolution within these separate species. This would suggest a mutational convergence that does involve adaptation other than compensation.



modify and optimize existing informa- 
tion to meet new environmental pres- 
sures. However, this also shows how 
evolutionary divergence is limited and 
results in closely related and not 
entirely novel functions. This may also
be true for the nuclear receptor family,
which is composed of ligand-mediated
regulators of gene expression. It is
inferred that "molecular tinkering,"
entailing modifications in ligand specificity
due to subtle changes in the
ligand pocket, where the signaling
compound binds, led to associations
in various duplicate members with
other hormones and signals [75].

Another interesting case of classical 
divergence within a gene family con- 
cerns the tetrameric oxygen-binding 
protein, hemoglobin, found in the red 
blood cells of vertebrates. Five variants
of hemoglobin exist at the β-globin
locus cluster in both humans and
chimpanzees, all under the control of
a single regulatory region [76]; each
member is differentially expressed
throughout the development of the
organism: Epsilon [HBE], for example, is
normally expressed only in the embryonic
yolk sac. It is precisely for this
reason that gene duplication may have
been involved in the division and
specialization of the original functions of
a gene among different paralogs,
as the organism cannot exactly
wait for the gene pertinent to the next
developmental stage of oxygen metabolism
to evolve. The five genes present
at the locus (including two HBG
variants) are highly similar in sequence
and could be the functional
equivalents of alternatively spliced isoforms
of the original gene.

Indeed, there are reasonable 
grounds to suppose that gene duplica- 
tion and mutation may be functionally 
comparable with the action of alterna- 
tive splicing in general [77]. For exam- 
ple, in certain species of the genus 
Drosophila, an ancestral sex-biased 
gene, JanusA, uses alternative splicing 
to encode two slightly different pro- 
teins, one present in multiple tissues 
of both sexes and the other present 
only in sperm. Duplication of JanusA 
created JanusB, which then specialized 
to encode a sperm-specific protein 
very similar to the function of the for- 



mer spliced variant [78]. Therefore, in 
this situation, no new information was 
produced. 

Subfunctionalization, whereby the 
information content of a parent gene 
is differentially partitioned amongst its 
daughters, is believed to be a common 
occurrence among surviving duplicates 
[79]. Here, duplication allows the original
functionality of a gene to be spread
across more stretches of DNA, 
although conserving the basic informa- 
tion content contained in the ancestral 
sequence. Subfunctionalization consti- 
tutes a loss in functional redundancy, 
due to the combination of both com- 
plementary degeneration and stabiliz- 
ing selection, and helps explain why 
knocking out certain paralogs can have 
a harmful effect. However, the benefit
is that a degree of functional
specialization can be reached, which
can bring gains in efficiency in certain
circumstances. In baker's yeast, Saccharomyces
cerevisiae, two galactose
regulatory genes (GAL1 and GAL3) are
believed to have evolved from a single
bifunctional gene in an ancestral species,
resulting in greater flexibility [80].
3.5. Duplication and Intragenic
Amplification: The Case of an AFGP in
Notothenioids

All of the examples above involve evo- 
lution within the existing kind as 
opposed to any divergence that would 
lead to the emergence of a new type of 
gene. The first clear attempt at 
explaining how an old protein gene 
could spawn a new gene coding for an 
entirely new protein, and with a dis- 
tinctly different function, is the case of 
a trypsinogen to antifreeze glycopro- 
tein (AFGP) conversion in the notothe- 
nioid species, Dissostichus mawsoni 
[81]. The ice-binding AFGP that circu- 
lates in the blood of the Antarctic fish 
enables them to avoid freezing in their 
perpetually icy environment. This cru- 
cial survival protein is believed to have 
evolved from a pancreatic trypsinogen- 
like protease — a digestive enzyme. 
Indeed, both proteins are observed to 
be biosynthesized and secreted in the 
pancreas [82], and this is reflected in 
the shared regulatory features found in 
the UTR and signal peptide. The AFGP 
is characterized by repeats of two 3- 
residue components: TAA and TPA. 
These comprise about 60% of the 362- 
residue protein, Dm3l, one member of 
the AFGP family. The reasons given for 
the possible origin of the AFGP from a 
protease ancestor are: 

i. Exon 1 (containing the secretory 
signal and 5'UTR) in both AFGP 
and trypsinogen genes is almost 
identical, as is the 3' UTR of both 
genes. 

ii. The sequence of intron 1 of the
trypsinogen gene is included, as
two parts, within intron 1 of the
AFGP gene.

iii. A 9-nt element in the trypsinogen 
gene — acagcggca (TAA) — that strad- 
dles intron 1 and exon 2 comprises 
the main repeating unit of the 
AFGP gene. 

iv. The topological proximity of both 
genes on the same chromosome 
indicates the likelihood of tandem 
duplication. 



v. The discovery of a chimeric
AFGP-protease gene (Dm7m) that may
be intermediate [83].
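
The correspondence in item iii between the 9-nt element and the tripeptide repeat can be checked directly against the standard genetic code. The Python sketch below is illustrative only: the codon table is restricted to the three codons needed, and the repeat count is a rough estimate that assumes the whole of the quoted 60% consists of 3-residue units.

```python
# Only the three codons present in the element are needed (standard genetic code).
CODONS = {"ACA": "T", "GCG": "A", "GCA": "A"}


def translate(dna):
    """Translate frame 0 of an upper-case DNA string using the partial table."""
    return "".join(CODONS[dna[i:i + 3]] for i in range(0, len(dna) - 2, 3))


element = "acagcggca".upper()   # the 9-nt element quoted in item iii
print(translate(element))       # TAA, i.e. Thr-Ala-Ala
print(translate(element * 4))   # TAATAATAATAA

# Rough estimate of repeat units in the 362-residue protein (assumption:
# all of the ~60% repeat content is made of 3-residue units).
approx_units = round(0.60 * 362 / 3)
print(approx_units)             # 72
```

Because the element is 9 nt long, any whole-number tandem copy of it keeps the reading frame and simply lengthens the Thr-Ala-Ala run.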

Cheng et al. speculate that the an- 
cestral protease gene was converted 
into the AFGP through a process that 
involved four major steps: a bulk dele- 
tion, intronic (de novo) recruitment, 
repeated internal amplification, and 
finally illegitimate recombination. 
However, this proposed mechanism is 
unlikely to have occurred for the fol- 
lowing reasons: 

i. The authors readily acknowledge 
that the bulk deletion of four exons 
and four introns is not likely to be 
tolerated even in a redundant 
duplicate, as it results in an entirely 
nonfunctional copy. This would 
make it liable for complete disinte- 
gration by null mutations, and not 
for its apparently miraculous rein- 
carnation as an entirely new gene. 

ii. The AFGP promoter elements at 
the 5' flanking sequence upstream 
of exon 1 are believed to be differ- 
ent from those found in the tryp- 
sinogen gene. Both proteins are 
produced in different amounts and 
also expressed in a different man- 
ner. The proper function and 
behavior of the glycoprotein 
depends on changes made or 
added to the promoter sequences. 

iii. Intron 1 in Dm3l is 1908 bp long, 
whereas the corresponding one is 
238 bp in the trypsinogen gene. 
There is no explanation provided 
for this eightfold difference in size, 
and the additional sequence's 
intronic information, other than a 
huge insertion (e.g., a retrotranspo- 
son, for which there is no trace) or 
a case of repeated intronic amplifi- 
cation. 

iv. The authors propose, implausibly, 
that the repeating TAA and TPA 
elements — hardly a unique 
sequence — could have been pro- 
duced by successive polymerase 
replication slippage or unequal 



intragenic recombination dozens 
of times over. However, this pro- 
cess is both indiscriminate and 
inefficient [84], and there is no rea- 
son to suppose it would selectively 
and exactly repeat the 9-nt ele- 
ments, with no resultant frameshift 
causing a premature termination. 
The positioning of the proline
residues is important as far as protein
stability and folding are concerned
[85].

v. There is no origin given for the 
inclusion of the important spacer 
sequence elements — LIF/LNF/ 
FNF/LNL [86]— other than an 
unsubstantiated and unfalsifiable 
claim that they could have been 
introduced through a yet unspeci- 
fied "recombinatory event." There 
is also a nonhomogenous pattern 
of repetition observed that is not 
exactly what one would expect 
from successive amplification. 

A key problem associated with the 
Darwinian mechanism of evolution is 
that many of the putative incipient 
and intermediate stages in the devel- 
opment of a biological trait may not 
be useful themselves and may even be 
harmful. This is exactly the problem 
with Cheng's proposed conversion. The 
incipient stage consists of a bulk dele- 
tion that would be almost certainly 
selected against, despite it being in a 
gene copy, as the cistron's core infor- 
mation and any useful functional re- 
dundancy it may have offered, would 
have been entirely lost. The resultant 
protein would be liable to misfold any- 
way. It is also extremely problematic 
that the initial intronic recruitment 
and its subsequent amplification 
would have been in any way func- 
tional — as far as binding to ice crystals 
or glycosylation is concerned — or have 
any exaptive utility. The hypothesized 
metamorphosis would have required 
widespread and related changes that 
must have been coordinated and 
synchronized, and so would represent
something to the effect of a directional
saltation. However, this is not something
that a blind, unsupervised process
can achieve. It is, however,
plausible to suggest that the common- 
ality shared between both genes at 
their respective termini is indicative of 
the possibility, at least, that the glyco- 
protein was derived from an ancestral 
protease template. 

Moreover, the antifreeze proteins 
that have been found in Arctic cod [87] 
are completely different in sequence 
and organization from their Antarctic 
cousins; this means that the same
trypsinogen-like gene could not have
been the ancestral gene in this case. 
Although this is passed off as evidence 
of "convergent evolution," this serves 
only to provide another problem as to 
how a gene believed to be of a more 
recent origin could have evolved. 

3.6. De Novo Recruitment Without 
Duplication 

Although duplication is central to the
modern evolutionary synthesis, in
recent years the possibility has been
raised that previously extragenic,
noncoding regions of
DNA could be recruited wholesale to
become translated as functioning proteins,
as opposed to just the minor exonization
observed in the formation of
the amino end of the Sdic gene. This
represents a return to the idea of the 
hopeful monster [88] at the molecular 
level. For example, such origination 
has been proposed in the case of the 
yeast gene BSC4 [89] (of unknown 
function); and the human upregulated
gene CLLU1 [90] that is believed to
have some role in the pathogenesis of
chronic lymphocytic leukemia and
shares structural motifs with the cyto- 
kine, IL-4, that is used in the immune 
system [91]. In the case of CLLU1, a 
single nucleotide deletion of adenine 
in a stretch of DNA orthologous with 
chimpanzees has created a frameshift 
and expanded ORF, large enough to be 
fully functional when translated as a 
protein. However, this inference may be 
incorrect. Rather than the deletion cre- 



ating a new stretch of translated DNA, it 
is likely that a back mutation restored 
the original ORF that became essen- 
tially divided in two as a result of an 
insertion — a very common phenom- 
enon observed in indel-induced frame- 
shifts [92]. Thus, far from being a case 
of bulk de novo recruitment of ncDNA, 
CLLU1 in humans is a gene that may 
have been fully reactivated while still 
inactive in other primate lineages. The 
corresponding gene in chimpanzees, if 
transcribed and regulated, may still be 
partially functional as two potential 42- 
codon reading frames are preserved at 
either terminus. Thus, the de novo and 
fortuitous origination of entire reading 
frames may be a profound misinterpreta- 
tion of cases of pseudogenes being reacti- 
vated. 
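
The scenario proposed here, an insertion that splits an open reading frame with a premature stop and a later deletion that restores it, can be illustrated with a short Python sketch. The sequence is invented for illustration and is not the CLLU1 locus; the helper function is likewise hypothetical.

```python
STOPS = {"TAA", "TAG", "TGA"}


def orf_codons(seq, start=0):
    """Count sense codons read from `start` until a stop codon or the end."""
    count = 0
    for i in range(start, len(seq) - 2, 3):
        if seq[i:i + 3] in STOPS:
            break
        count += 1
    return count


intact = "ATGGCTGAACGTTTACCGGATAAAGGCTCATGGTAA"  # hypothetical 11-codon ORF
# A 1-nt insertion shifts the frame and exposes an early TAA stop.
disrupted = intact[:7] + "T" + intact[7:]
# Deleting the inserted nucleotide restores the original frame.
reactivated = disrupted[:7] + disrupted[8:]

print(orf_codons(intact))       # 11
print(orf_codons(disrupted))    # 7
print(orf_codons(reactivated))  # 11
```

Seen only from the disrupted state, the final deletion appears to "create" a longer ORF, although in this toy case it merely recovers the one that existed before the insertion.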

Alternatively, functional sections of
noncoding DNA, or perhaps even "dormant"
reading frames, may have become
translated into proteins that perform a
particular task. There is indeed evi- 
dence for the existence of ORFs within 
introns [93] and other regions of non- 
coding DNA [94] that may be the result 
of transposition events. However, 
another possibility is that instead 
"junk" sequences of ncDNA are acci- 
dentally transcribed and translated 
into nonfunctional products that are 
fixed by neutral evolution, and which 
serve no purpose, other than perhaps 
being assigned to the cell's garbage 
collection and recycling system. In any 
case, as a mechanism for the creation 
of novel motifs and protein domains, 
de novo recruitment of noncoding 
DNA would seem extremely improb- 
able and implausible. 

4. CONCLUSION 

Gene duplication and subsequent evolutionary
divergence certainly add to
the size of the genome and in large 
measure to its diversity and versatility. 
However, in all of the examples given 
above, known evolutionary mecha- 
nisms were markedly constrained in 
their ability to innovate and to create 



any novel information. This natural 
limit to biological change can be 
attributed mostly to the power of puri- 
fying selection, which, despite being 
relaxed in duplicates, is nonetheless 
ever-present. The reason for this stabi- 
lization of function is not obvious, 
although the role of duplicates in com- 
pensating for deleterious loss of func- 
tion mutation at paralogous sites may 
be an important factor. Likewise, there 
exists a preservation of ancestral func- 
tions through the process of a differen- 
tial division of labor among duplicates, 
namely that of subfunctionalization. 
Moreover, neither the possibility nor the
opportunity for beneficial changes
leading to major functional innovations
was found to be especially
convincing. For example, duplicate
enzyme-coding genes tend to retain 
the same ancestral catalytic activity 
and simply apply that function to dif- 
ferent substrates, often by partial deg- 
radation of function and the loss of 
the precise specificity of the parent. 
However, these may prove to have an 
important adaptive value in response to 
environmental challenges such as with 
respect to temperature, drought, patho- 
gens, and UV radiation. 

Where substantive sequence evolu- 
tion had occurred, it could have been 
because a respite in selective con- 
straints led to significant degeneration. 
In the case of Sdic and Jingwei, both 
genes evolved from duplicates affected 
by significant deletions or the silencing 
of exonic information and were then 
co-opted for use in a different context. 
This development has likely been mis- 
interpreted in many cases as evidence 
of a gain in information under positive 
Darwinian selection, especially when 
extensive compensatory changes are 
involved that can amplify sequence 
divergence in the process. In this 
sense, a proclivity toward functional 
stability and the conservation of infor- 
mation, as opposed to any adventur- 
ous innovation, predominates. 

The various postduplication mecha- 
nisms entailing random mutations and 
recombinations considered were
observed to tweak, tinker, copy, cut, 
divide, and shuffle existing genetic in- 
formation around, but fell short of 
generating genuinely distinct and 
entirely novel functionality. Contrary to 
Darwin's view of the plasticity of bio- 
logical features, successive modifica- 
tion and selection in genes does 
indeed appear to have real and inher- 
ent limits: it can serve to alter the 
sequence, size, and function of a gene 
to an extent, but this almost always 
amounts to a variation on the same 
theme — as with RNASE1B in colobine 
monkeys. The conservation of all-im- 
portant motifs within gene families, 
such as the homeobox or the MADS- 
box motif, attests to the fact that gene 
duplication results in the copying and 
preservation of biological information, 
and not its transformation as some- 
thing original. 



The case of evolution in notothe- 
nioid fish, entailing the speculative 
conversion of a protease duplicate into 
an AFGP, only serves to demonstrate 
the huge problem of supposing that 
cumulative random changes would 
contrive to produce novel information, 
especially if major deletions and other 
degenerative mutations were involved. 

Although the focus here has been 
on the information within exons that 
code for the amino acid sequences in 
proteins, noncoding DNA — which 
comprises the vast majority of the 
molecule — also contains information 
necessary for the regulation and 
expression of gene products. Changes 
in these regions can have a profound 
effect on an organism's evolution. But, 
although important, without a reper- 
toire of proteins with which to regu- 
late, this is ancillary in effect. For 
example, it is impossible for an orga- 



nism to develop vision without the 
exons coding for light-sensitive opsins 
or feathers for flight without the pres- 
ence of keratins in the skin. 

Gradual natural selection is no 
doubt important in biological adap- 
tation and for ensuring the robust- 
ness of the genome in the face of 
constantly changing environmental 
pressures. However, its potential for 
innovation is greatly inadequate as 
far as explaining the origination of 
the distinct exonic sequences that 
contribute to the complexity of the 
organism and diversity of life. Any 
alternative/revision to Neo-Darwinism
[95] has to consider the holistic
nature and organization of informa- 
tion encoded in genes, which specify 
the interdependent and complex bio- 
chemical motifs that allow protein 
molecules to fold properly and func- 
tion effectively. 